The Piagetian literature of the Sixties abounds with confusions. Was
the original Genevan research replicable? If so, were the criteria used
and the mode of testing nevertheless too subjective? If replicable, was
the construct underlying the different Genevan research problems unitary?
If unitary, what was the status of the underlying model? If subjective,
then by what process can the theory, if unitary, be objectified? Each of
these questions points to the necessity of standardising the original
Genevan procedures, and economy of research effort points to carrying this
out in the form of group tests.

Yet, by the Seventies, the problem of 'psychometrising Piaget'
(Tuddenham, 1970) had still not been solved. Although this was due in part
to the confounding of the above questions, it was due also to the Piagetian
data not fitting the conventional data-analysing methods. To carry out a
survey of Piagetian developmental norms on a sample large enough to generalise
from (Shayer, Küchemann & Wylam, 1976; Shayer & Wylam, 1978), and to carry
out a study of the validity of Piaget's construct of formal operational
thinking (Shayer, 1979), it was necessary to produce a solution. This paper
could be sub-titled 'The author's misfortunes in the wilderness of test-
theory'. One way out of that wilderness will be sketched, and illustrated
in the construction of one of the Piagetian tests required.

Both the Piagetian model and the procedure for reporting research
findings are quite different from those developed by the psychometric tradition
of norm-referenced tests. Although the former involves the reporting of a
wide range of behaviours, in their categorisation these are collapsed under
a small number of global descriptors: Early and Late Concrete (2A & 2B), or
Early and Late Formal (3A & 3B). The latter imply a large number of
behaviours, each represented by test items scored on a pass/fail basis, and
compared with each other in test construction by item-analysis techniques
which use the performance of large samples of subjects as tests of the items'
validities and reliabilities. This typically results in an equal-interval
scale with about sixty enumerable intervals. The advantage of this is to
make the test estimates amenable to measurement theory: the disadvantage of
the Piagetian clinical interview is that only the overall global assessments
can be compared quantitatively. In many papers reporting low correlations
between Piagetian tasks there is a curious innocence with regard to the
reliability of the interview estimate or, indeed, to simple quantitative
considerations such as whether the sample range is wide enough for the
calculated correlation to be a true measure of the underlying association.

Questions of test procedure required to make a group-test situation
equivalent to an interview, for the subject, are discussed in Shayer, Adey
and Wylam (1980) and so will not be referred to here except to note that
demonstration, feedback to the subject related to his own ideas, and flexible
verbal communication between test administrator and group are essential.
Taking these for granted, the essence of the problem for the test constructor
is to represent all the behaviours mentioned in the original Genevan research
as characteristic of one or another stage or sub-stage by test-items. In
this way, as a first step, the validity of the ascription of behaviour to
stage can be checked objectively by the performance of a suitable sample of
children. But there is a problem. In the Genevan descriptions of behaviour
sometimes the problem can be solved at a particular developmental stage. In
this case a person who has higher levels of thinking at his disposal will
always solve the problem. Suppose a test-item is only solved at the Early
Formal (3A) level, but that subjects with Late Concrete (2B) competence show
consistent fallacious responses to the item, which differentiate such
subjects from Early Concrete or lower stage subjects who show different
fallacious responses. If the stage-behaviours are scored dichotomously
(1,0), then either one is forced to lower the subject's score on the 2B
items when he succeeds at the 3A stage by scoring it only as a 3A item, or
one introduces false correlation into the test content by automatically
adding a 2B score to the subject's sub-test totals if he shows the higher
level 3A behaviour which solves the problem. To avoid the dilemma it is
necessary to construct tests out of items each of which is scored only for
success, and categorised at a particular level. By 'success' is meant 'true
in reality', e.g. that Length is one of the variables affecting the period
of a Pendulum (2B), or that to find that Weight is not effective the valid
experimental method is to keep length and push constant, and take just two
different weights (3A). To find items to test lower level competencies one
looks for aspects of the problem(s) to which they are adequate. Thus for 2B
items in Equilibrium in the Balance one chooses 2 : 1 or 3 : 1 ratios of
weights or lengths from pivot (Inhelder & Piaget, 1958). Thus each item
will be labelled with the minimum stage required for success on it. Such a
method of test-construction will simultaneously be true to the hierarchical
developmental theory of Piaget, and at the same time allow of an
experimental test of the validity of the theory. If the theory is not true
the later stage-items will not scale with the earlier stage items. Moreover
the test-items are now amenable to all the usual item-analysis techniques,
including correlation methods such as factor-analysis. In this way one can
bring the constructivist theory of Piaget, which relates developing mental
structures to the complexity of the relationships which they enable the
subject to discover or impose upon the world, into contact with a test-theory
and method of test-construction more usually associated with an empiricist
or behaviourist approach to the increase of intelligence.



THE CONSTRUCTION OF A PIAGETIAN TEST 

Discrimination diagrams

The argument of the previous section may be more easily appreciated in
the example of the development of a particular Piagetian test (NFER, 1979).
This was constructed for the purpose of estimating the Piagetian stages of
children over a rather wide range, from Early Concrete to Early Formal
operational. The subject matter was taken from The Child's Construction
of Quantities (Piaget & Inhelder, 1974), and traced all the steps by which
the child is eventually able to conceive of the density of substances as a
weight/volume relationship. It was necessary to find some problems which
are solved successfully by children at the early stage of development of
concrete operational thinking (2A); some which are solved at an intermediate
level (2A/2B); others which are rarely solved until the child possesses the
whole structure of mental operations which Piaget describes as Late or mature
concrete operational thinking (2B); and, finally, items which are not solved
successfully (in the sense of trueness to reality) until the formal operational
stage (up to 3A). The concepts involved are listed in Table 1.



Insert Table 1 here 



Bearing in mind that the purpose of such tests is to estimate as precisely as
possible the optimum present level of thinking which a child possesses (rather
than making a random sample of his strategies) it is obviously necessary to
choose problems which give a sharp signal. By choosing several non-redundant
problems for each level the signals summate so as to increase the precision
of estimate. From an item-analysis point of view this means that facility
is not the only characteristic of an item in which one is interested. The
discrimination of an item measures this sharpness of its signal. Unfortunately
it was soon found that the conventional discrimination indices do not give
enough information as to the way in which the item behaves in the test
context. One needs to know how well the item differentiates between a given
level and those immediately below, irrespective of whether, for a given
population sample, it happens to have a 50% facility, or a 16 or 90% facility.
For this purpose the whole test sample, and all the test items, may be used to
examine the performance of each item.

First the test items are grouped according to levels, and each subject
given an overall level assessment based on a 2/3 success principle. Thus
if there are three 2A items, and the subject succeeds on at least 2, he is
capable at least of Early Concrete thinking. If he fails to reach the 2/3
criterion on any higher group of items, then he is assessed at the 2A level.
If he succeeds on at least 2/3 of the 2B items also, he is assessed 2B, and
so on. Then, for each item, the percentage of the subjects assessed overall
as 2A who succeed on the item is calculated. The calculation is repeated
for the 2B subjects, and for the subjects assessed overall at each of the
other levels. Such a discrimination diagram for an item in Volume and
Heaviness is given in Figure 1.
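The procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the level list, the item names and the function names are hypothetical, and the rule that assessment stops at the first level where the 2/3 criterion is missed is one reading of the text, not something the paper spells out.

```python
from collections import defaultdict
from fractions import Fraction

# Ordered levels, lowest first (illustrative; intermediate levels could be added).
LEVELS = ["2A", "2B", "3A"]
CRITERION = Fraction(2, 3)  # the 2/3 success principle


def assess_level(responses, item_levels):
    """Return the highest level reached, or None if even the lowest is missed.

    responses:   dict item -> 0/1 score for one subject
    item_levels: dict item -> level label from LEVELS
    Assumption: assessment stops at the first level where the criterion fails.
    """
    assessed = None
    for level in LEVELS:
        items = [i for i, lv in item_levels.items() if lv == level]
        if not items:
            continue  # no items written for this level
        passed = sum(responses[i] for i in items)
        if Fraction(passed, len(items)) >= CRITERION:
            assessed = level
        else:
            break
    return assessed


def discrimination_diagram(all_responses, item_levels, item):
    """Percentage passing `item` among the subjects assessed at each level."""
    counts = defaultdict(lambda: [0, 0])  # level -> [passes, subjects]
    for responses in all_responses:
        level = assess_level(responses, item_levels) or "below " + LEVELS[0]
        counts[level][0] += responses[item]
        counts[level][1] += 1
    return {lv: 100.0 * p / n for lv, (p, n) in counts.items()}
```

Plotting the returned percentages against the ordered levels gives the ogive for the item; note that the item being examined also contributes to the overall assessment, as it does in the procedure described in the text.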

Figure 1 about here



Such a method allows direct inspection of the item's discrimination
characteristics. Thus one item may be compared with another; with a fresh
sample, changes in the presentation or wording of an item may be compared;
and slight changes in the scoring rules used to assess the overall level of
the subject on the test may be compared with each other. The purpose of all
the changes would be to increase the precision of each item, which is
measured directly by the abruptness of the ogive. The discrimination level
of the item can also be accurately gauged by the centre point of the ogive.

It may be remarked here that an empiricist skill-integrationist account
of intellectual development can be distinguished experimentally from the
Piagetian account of developing general structures by such test-analysis.
The former should give gently increasing discrimination diagrams, since the
particular order in which a given child would develop particular skills would
depend on the accidents of their experience. As they get older or brighter
the probability would merely increase that any child would have achieved a
concept. On the Piagetian account an invariant sequence, dependent on the
hierarchical development of mental structures, would give diagrams with sharp
ogives, since if a child possessed a given structure there would be a very
high probability that he would solve all tasks requiring that structure.

Scalability, unidimensionality, and the Loevinger test-theory

It is curious that the elegant, subtle and powerful critique made by
Jane Loevinger (Loevinger 1947; 1948) of current test theory in the Forties,
and the new methods which she described, should have featured so little in
the research literature. Perhaps it is a rare example of a data-processing
method developed in advance of its time, when no problems existed whose
theoretical model required such analysis-techniques. Guttman scalogram
analysis, though vigorously criticised, has fared better. One can cite the
ultimate accolade of its presence in the SPSS package. But the reason for
this is that it was conceived in response to attitude variables, whose implied
underlying model is strict scalability. It has been widely used in
sociology rather than psychology. A bipolar variable such as xenophobia
(and xenophilia) should be expected to scale, since feelings and attitudes
are usually unified. But to place a subject somewhere on a xenophobia-
xenophilia scale implies a different model from that underlying any account
of developing intelligence. Even the Piagetian account, which seems to come
closest to implying scalability, differs importantly from attitude variables
in that no successes of an earlier stage are lost with the development of a
later stage. For example, simple cause and effect thinking such as the
connection between the weights on a spring and its extension would still be
used by a person capable of Late Formal thinking, because such Late Concrete
thinking is perfectly adequate to the relationship in question. Scalogram
analysis would seem best to fit data in which there are gradual qualitative
changes in behaviours or strengths of feeling over the whole scale.

Loevinger's approach ran parallel to the development of factor-analytic
methods. It was, in part, an attempt to develop a method of test-analysis
which would ensure that the test actually measured something. Rather than
produce composite intelligence tests, and find out afterwards by factor-
analysis which set of abilities are estimated by the different items, she
announced that it was better to start with a theory which should impose a
unified construct on a test, select test items in accordance with the theory,
and then use her own appropriate method of test-analysis to improve the tests.
Yet to the author's knowledge this approach was never actually used. If
there is a well-defined mental construct, then it should be possible to
measure increasing development of it by subjects. A unidimensional test
derived from the construct should, of course, be uni-factor (Lumsden, 1961).
But as will be discussed later, there are technical reasons why factor-
analysis may not give a clear decision where test-item data cover a very
wide range of mental functioning. Loevinger's definition of test homogeneity
allows the functioning of each item to be inspected directly. Each item
should be related to another item which tests a greater degree of achievement
of the underlying construct by the relationship 'if the latter, then the former'.
She developed three indices which quantified this relationship for one item
in relation to another (H_ij), for an item in relation to the test as a whole
(H_it), and for the degree of homogeneity of the test (H_t). Again, it will
be seen later that her various H indices encounter a technical problem related
to the 'difficulty-factor' problem in factor-analysis. But her principle of
test-analysis is unaffected by this, and is obviously a close fit to the
Piagetian model. Both Nassefat (1963) and Goldschmid and Bentler (1968)
have used it on Piagetian data. The discrimination-level diagrams referred
to earlier are obviously closely related to the Loevinger analysis. A sharp
ogive will mean a high homogeneity within the context of the test as a whole,
and a set of items of different facilities, each with sharp discrimination,
will define experimentally a unidimensional construct. Thus we have a method
of test-analysis which should suit Piagetian data if the Piagetian model is
valid. Figure 2 gives the complete set of discrimination diagrams for the
test, Volume and Heaviness, determined on a representative sample of 12 year
olds.
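The 'if the latter, then the former' relationship between a pair of items can be made concrete with a short Python sketch. It uses one common formulation of the pairwise index, namely one minus the ratio of observed to chance-expected violations (passing the harder item while failing the easier one). The function name and the choice of this formulation are assumptions made for illustration; the paper does not give Loevinger's formulae.

```python
def loevinger_h(easier, harder):
    """Pairwise homogeneity for two dichotomous items.

    easier, harder: equal-length lists of 0/1 scores for the same subjects,
    with `easier` being the item of higher facility.
    A violation is a subject who passes the harder item but fails the easier.
    Returns 1 - observed violations / violations expected under independence.
    """
    n = len(easier)
    p_easy = sum(easier) / n
    p_hard = sum(harder) / n
    violations = sum(1 for e, h in zip(easier, harder) if h == 1 and e == 0)
    expected = n * p_hard * (1 - p_easy)
    if expected == 0:
        raise ValueError(
            "undefined when the harder item is never passed "
            "or the easier item is always passed"
        )
    return 1 - violations / expected
```

A perfectly hierarchical pair (no violations) gives 1 whatever the two facilities are, and a pair that is statistically independent gives a value near 0.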



Fig. 2 about here 



PROBLEMS WITH FACTOR-ANALYSIS



Attempts to test for unidimensionality by factor-analysis run foul of
the 'difficulty-factor' problem (Ferguson, 1941) on data such as this. One
can factor-analyse the items within a composite Piagetian test, and find
oneself with, not one factor as the Piagetian model requires, but a
'Concrete factor' and a 'Formal factor' and even possibly an intermediate
'transitional formal factor' as well (Lawson 1978). Factor-analysis as
a black-box tool is really best used on data all of which are around the
50 per cent facility level for the sample chosen. It will then accurately
differentiate the correlation matrix into the number of factors required
to explain the data. The reason for this has most clearly been shown by
Carroll (1951). Factor-analysis is a process of grouping of the cells of
a correlation matrix. If the correlation coefficient used is Pearson r,
or the phi-coefficient which is the form it takes for dichotomous data, then
the maximum value it can take is limited by the degree of overlap in the two-
dimensional matrix of the variables. Ferguson showed that if a set of items
span a facility range from, say, 10 to 90 per cent, the correlation matrix
will split into at least three sets, even though the 'true' correlation
between all the items is the same. Items with facilities in the same range
may attain a value of nearly 1, if perfectly correlated, but when correlated
with items in a different facility range may be limited to a maximum of 0.5
or less, and will be lowered proportionally if less than perfectly correlated.
Thus the factor-analysis procedure can produce several factors from a uni-
dimensional set of items.
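The ceiling on the phi-coefficient can be worked out directly from the two facilities. The sketch below assumes the standard result that phi is greatest when everyone who passes the harder item also passes the easier one; the function name is hypothetical.

```python
import math


def phi_max(p1, p2):
    """Largest phi attainable between two dichotomous items.

    p1, p2: facilities (proportions passing), each strictly between 0 and 1.
    The maximum occurs when the joint pass rate equals the smaller facility.
    """
    lo, hi = min(p1, p2), max(p1, p2)
    return math.sqrt((lo * (1 - hi)) / (hi * (1 - lo)))


if __name__ == "__main__":
    for pair in [(0.5, 0.5), (0.2, 0.5), (0.1, 0.9)]:
        print(pair, round(phi_max(*pair), 3))
```

Two items both at 50 per cent facility can reach phi = 1, whereas items at 20 and 50 per cent are capped at 0.5 and items at 10 and 90 per cent at about 0.11, even if the pair is perfectly hierarchical.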

A partial solution to this problem was offered by Bentler (1971) under
the name of Monotonicity Analysis, and has been used both by him (Goldschmid &
Bentler, 1968) and Hooper and Dihoff (1975) in the analysis of Piagetian data.
In effect the method involves changing the association index to one which
does not drop when the facility of items varies. The index he used is one
proposed by Yule in 1912, which was developed by Yule in response to an
analogous problem, where use either of Pearson r or of tetrachoric r produced
either negative or positive distortion of the association relationship when
the marginal values differed widely. Yule's Y or omega index (Bentler's 'm'
reduces to Y for dichotomous data) was conceived to yield, as nearly as
possible, the same value which the phi-coefficient would have given for the
data-matrix if it had been cut to give 50/50 marginal values. Even this
coefficient distorts when the value of one of the four correlation cells
drops nearly to 0, but it does so less than any other coefficient, including
Yule's Q. Yule showed that the sampling variation for this coefficient is
less than that of any of the previously named variables. The reason for
this can be seen in Carroll's diagrams, as can be also seen the technical
reason why Loevinger's H (this can be shown to be identical to phi/phi_max)
cannot be used as a solution to the 'difficulty-factor' problem. It, like
tetrachoric r, distorts strongly, but positively, when the correlation matrix
is cut at extreme values. It therefore also leads to 'difficulty-factors',
but in this case by associating items of widely differing facilities.
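A small sketch may help to show how Yule's Y behaves beside phi on a 2 x 2 table. The cell labels a, b, c, d (both passed, first only, second only, neither) and the function names are assumptions made for illustration; the formula for Y is the usual one, (sqrt(ad) - sqrt(bc)) / (sqrt(ad) + sqrt(bc)).

```python
import math


def phi(a, b, c, d):
    """Phi-coefficient for a 2 x 2 table with cells
    a = pass both, b = pass first only, c = pass second only, d = fail both."""
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("phi is undefined when a marginal total is zero")
    return (a * d - b * c) / denom


def yule_y(a, b, c, d):
    """Yule's Y (coefficient of colligation) for the same table."""
    root_ad = math.sqrt(a * d)
    root_bc = math.sqrt(b * c)
    if root_ad + root_bc == 0:
        raise ValueError("Y is undefined when both ad and bc are zero")
    return (root_ad - root_bc) / (root_ad + root_bc)


if __name__ == "__main__":
    # 50/50 marginals: the two coefficients agree.
    print(phi(40, 10, 10, 40), yule_y(40, 10, 10, 40))
    # Unequal facilities with one empty cell: phi is depressed, Y goes to 1.
    print(phi(20, 60, 0, 20), yule_y(20, 60, 0, 20))
```

With 50/50 marginals both coefficients give 0.6 for the first table. In the second table the facilities are 80 and 20 per cent and one cell is empty: phi falls to 0.25 while Y rises to 1, which illustrates both the difficulty effect on phi and the distortion of Y when a cell approaches zero.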

Technically, the procedure is to use a simple principal components
analysis programme on a matrix of Yule's Y coefficients. It is easy to
write a programme to compute Yule's Y, and insert it in the SPSS PA 1
programme in place of the phi-matrix. When this was carried out for the
Volume and Heaviness task, it produced a two-component solution, as given
in Table 2.



Insert Table 2 here



However, it has to be admitted that all this is stretching the factor-analysis
procedure to the analysis of hierarchical data to which it is not really suited.
The problem is that the Component 2 loadings are on high facility items.
Even Yule's Y cannot pick up a relationship when there is virtually no
overlap in the data. As will be seen later, items 1 to 3 can be accommodated
quite well within the context of the overall test construct. What is
required is a data-analysing technique which represents all the information
in the test's data.

REPRESENTATION OF TEST CONTENT

The 'flogging-wall'



In a powerfully argued review (Lumsden, 1976) Lumsden playfully offered
a concrete metaphor to try to elucidate some of the paradoxes and fallacies
involved in eight years and more of the psychometric test literature. The
same kind of conclusion was reached much earlier in a penetrating review
called 'The Attenuation Paradox' by Jane Loevinger. She showed that there
was only one kind of item distribution (Loevinger 1954, Fig. 3) in which the
validity did not decrease with increasing reliability: that in which the
items had a rectangular distribution of item facilities. Lumsden's 'flogging-
wall' suggests an explanation for this. He suggests an analogy between
the test-estimation process and an attempt to estimate the height of a
subject by drawing him past a wall out of which are waving canes, each at a
different height, but some flogging up and down with different amplitudes.
The subject's height is estimated by the number of cane-cuts he accumulates
in the course of his trial. By analogy, the discrimination and spacing of
the items in a test should be such as to compromise between leaving no gaps,
and ensuring that at each level of the test there are enough overlapping
items to increase the precision of estimate. The discrimination should
not be too coarse or the reliability, or precision of estimate, will be low.
This metaphor suggested a method of representing the three important
aspects of items in two dimensional space. As Loevinger (1954, p503) pointed
out, one needs a quantitative estimate of the parameters of facility,
discrimination level, and discrimination power, or fineness. From the
discrimination diagrams used earlier one may, by imposing an equal-interval
scale onto the Piagetian levels used, estimate the Piagetian levels of the
subjects who, successively, show 25%, 67%, and 75% success on an item. Note
that these are not facility levels. The 67% level is that at which 67 percent
of the subjects assessed by the test overall at that level pass the item. In
effect, this is the discrimination level of the item. In Figure 3 the facility
of the items is plotted against a line spanning the 25, 67 & 75% levels for
each item. The length of the item line is an inverse measure of the
unidimensionality of the item in the test context and estimates discrimination
power. It will be seen that all items discriminate with a satisfactory sharp-
ness, with the exception of item 3b, conservation of Weight, and item 11,
intuitive Density. This, it is true, was reflected in the low communalities
of these two items in the factor-analysis, but the significance is clearer in
this diagram. In the case of item 11 one must say that it is not as closely
related to the overall developmental construct as are the other items. Item
3b looks as though it should discriminate at a lower level, but there is
clearly some aspect of the weight conservation problem (this is of a grain
of corn being 'popped' by heat) which renders the facility less and the
discrimination level higher than one might have expected. This points to some
deficiency not picked up earlier in the formulation of the item itself.
Further Piagetian research questions are obvious, but it is not the purpose
of this paper to explore them.

Volume and Heaviness (Task II, NFER 1979)
Horizontal axis: PIAGETIAN LEVEL (Subjects): 2A, 2A/2B, 2B, 2B/3A, 3A, 3B
241 9-12 year-olds. Mixed. Slightly above average sample.
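Reading off the level at which a given percentage of subjects succeed can be sketched as follows. The paper does not say how the levels were read from the diagrams, so the linear interpolation between adjacent levels used here is an assumption, as are the function names and the example percentages.

```python
def crossing_position(percentages, target):
    """Position on an equal-interval level scale where success reaches `target`.

    percentages: success rates (0-100) at the ordered levels, lowest first;
                 level k is taken to sit at position k on the scale.
    Returns a fractional position found by linear interpolation (an assumed
    method), or None if the curve never rises through the target.
    """
    for k in range(len(percentages) - 1):
        lo, hi = percentages[k], percentages[k + 1]
        if hi > lo and lo <= target <= hi:
            return k + (target - lo) / (hi - lo)
    return None


def item_line(percentages, targets=(25, 67, 75)):
    """Positions of the 25, 67 and 75 per cent levels for one item."""
    return tuple(crossing_position(percentages, t) for t in targets)


if __name__ == "__main__":
    # Hypothetical discrimination diagram over four successive levels.
    start, centre, end = item_line([10, 30, 80, 95])
    print(start, centre, end, "line length:", end - start)
```

For the hypothetical item above the line runs from 0.75 to 1.9 on the level scale, with the 67 per cent (discrimination) level at 1.74. A shorter line corresponds to a sharper ogive.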

Such a diagram does most of the work in estimating
the unidimensionality of a test, and has the advantage both of representing
the discrimination levels of the items and their spacing within the test,
and also of pin-pointing the departures of any items from the overall test-
construct. It will not work in reverse, of course. A multi-dimensional test
would have item-lines all stretching widely across the test-space. Only
factor-analysis would indicate how many factors were involved. But where,
as in Piagetian studies, it is a unidimensional construct one is attempting
to explore, such a diagram does represent all the parameters which Loevinger
was attempting to characterise in test-construction, and provides an overall
check on the test construct which can suggest immediate remedy.



A 67% success criterion was originally taken for pedagogic reasons. It
seemed a reasonably stringent proportion by which to tell whether a pupil
understood the basic principle underlying several items testing a
science concept. Subsequently it was found empirically to be the cutting
level which gave the best scaling of groups of items which differed widely
in facilities, and which were expected to scale at several different levels.
There may be a good technical reason for this, but it is not known to the
author.
TABLE 1

CONCEPTS IN VOLUME AND HEAVINESS TEST

Concepts                                 Piagetian level of concept

Conservation of substance (Mass)         2A       Early Concrete

Internal volume and intuitive
density (Heaviness)                      2A/2B

Conservation of Weight and
occupied volume                          2B       Late Concrete

Displacement volume                      2B/3A

Density as a weight to
volume relationship                      3A       Early Formal


TABLE 2

VOLUME AND HEAVINESS: PRINCIPAL COMPONENTS

Component loadings after Varimax rotation

Item      Component 1      Component 2

1                              74
2                              69
3a            57               36
3b                             43
5             68
6             54
7             61
8             67               30
9             66
10            67
11            34
12            50
13a           57
13b           67
14            38


Fig. 1 Volume & Heaviness. Item 9

Discrimination-level diagrams for questions in Task Volume and Heaviness

Volume and Heaviness (Task II, NFER 1979)
PIAGETIAN LEVEL (Subjects)
241 9-12 year-olds. Mixed. Slightly above average sample.